I. Introduction

In this project, our group set out to explore and analyze the subset of 311 calls dealing with animal-related complaints. We were surprised by the sheer volume of animal-related inquiries: over the six-year period available from the 311 website, we extracted nearly 250,000 complaints. To gather this very specific subset of data, we used a keyword search approach, combining generic and specific animal-related terms to capture as many complaints as possible. Some of the keywords we used were: ‘animal’, ‘animals’, ‘dog’, ‘dogs’, ‘pet’, ‘wildlife’, ‘infestation’, ‘bird’, ‘birds’, ‘pigeon’, ‘pigeons’, ‘pest’, ‘bee’. Once the data was collected, we each explored it individually and discovered a series of interesting stories, which we present in the sections below.
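The keyword search can be sketched roughly as follows. This is a minimal illustration rather than the group's actual script: the field names `complaint_type` and `descriptor` and the sample records are assumptions, and only the keywords quoted above are used. Matching on word boundaries with an optional plural "s" is one possible design choice; it catches "Pests" while avoiding false hits such as "bee" inside "been".

```python
import re

# Keywords quoted in the text; the full list used in the project may be longer.
KEYWORDS = ["animal", "animals", "dog", "dogs", "pet", "wildlife", "infestation",
            "bird", "birds", "pigeon", "pigeons", "pest", "bee"]

# Case-insensitive whole-word pattern that also accepts a trailing plural "s".
PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, KEYWORDS)) + r")s?\b",
    re.IGNORECASE,
)

def is_animal_related(record):
    """Return True if any keyword appears in the complaint type or descriptor.

    The field names are hypothetical; adjust them to the actual 311 export.
    """
    text = f"{record.get('complaint_type', '')} {record.get('descriptor', '')}"
    return PATTERN.search(text) is not None

# Hypothetical sample records, for illustration only.
sample = [
    {"complaint_type": "Animal Abuse", "descriptor": "Neglected"},
    {"complaint_type": "Noise - Residential", "descriptor": "Loud Music/Party"},
    {"complaint_type": "Unsanitary Pigeon Condition", "descriptor": "Pigeon Waste"},
    {"complaint_type": "Street Condition", "descriptor": "Has been reported"},
]
animal_calls = [r for r in sample if is_animal_related(r)]
```

A plain substring match would be simpler but would also flag unrelated text, so some manual review of the matched complaint types is still advisable with either approach.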

II. Exploratory Analysis

We explored the 311 calls about animals from several perspectives: the geographic distribution of some illegal pets, and complaint patterns by agency and over time. Here is a simple summary dashboard for all the animal-related 311 calls since 2010.

The following plot shows all the animal-related complaints. The leftmost column lists the complaint types related to animals, the middle column shows the agencies, and the rightmost column shows the boroughs. The width of each line is proportional to the number of complaints. From the plot, we can see which types of complaints are handled by which agencies and how many complaints each agency handled, as well as how many complaints each agency received from each borough.
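The line widths in such a plot are simply counts of calls that share the same complaint type, agency and borough. A minimal sketch of that aggregation is shown here; the field names and the records are invented for illustration and are not taken from the project data.

```python
from collections import Counter

# Hypothetical records; the field names are assumptions, not the 311 schema.
calls = [
    {"complaint_type": "Unsanitary Condition", "agency": "HPD", "borough": "BRONX"},
    {"complaint_type": "Unsanitary Condition", "agency": "HPD", "borough": "BRONX"},
    {"complaint_type": "Unsanitary Condition", "agency": "HPD", "borough": "BROOKLYN"},
    {"complaint_type": "Animal in a Park", "agency": "DPR", "borough": "QUEENS"},
]

# Width of each link between the left and middle columns (type -> agency).
type_to_agency = Counter((c["complaint_type"], c["agency"]) for c in calls)

# Width of each link between the middle and right columns (agency -> borough).
agency_to_borough = Counter((c["agency"], c["borough"]) for c in calls)
```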

2.1 Unusual Pet Complaints

While investigating the different animal-related complaints in the dataset, we found the calls about “unusual”, “exotic” or “illegal” pets to be among the most interesting. Unfortunately, the current 311 website does not provide much detail on the calls or the specifics of what was being reported. However, we found an archived earlier version of this dataset in which the calls were described in detail, and since it was interesting and relevant for our analysis, we decided to scrape it.

The archived file is in JSON format and can be accessed on the web at this link. To process it and convert it into a standard table format that could be consumed by R’s mapping libraries, we used Python’s Pandas and NLTK libraries. An IPython notebook with the complete scripts is included as an attachment; here we summarize the steps we took to extract and map the most relevant information.

  1. We noticed that the JSON was not correctly formatted, as it had some unmatched curly brackets and deeply nested objects with no valuable information. As a first step we cleaned up the file, so that it could be consumed by Pandas.
  2. Once the file was in a clean and tidy format, we proceeded to extract relevant information such as the call content, its latitude, longitude, address, locality and time when it was reported.
  3. Next we used NLTK to do some text mining on the call contents. We tokenized the text and removed stopwords so we could extract relevant entities such as the name of the animal being reported (e.g. pig, rooster).
  4. After this, the data was exported to a CSV file and loaded into R for further analysis.
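The steps above can be sketched as follows. The notebook itself used Pandas and NLTK; this sketch uses only the Python standard library as a stand-in, so the tiny stopword set and the animal vocabulary are placeholders for NLTK's stopword corpus and the group's actual entity list. The JSON field names and the two sample calls are also assumptions, since the archive's structure is not shown here.

```python
import csv
import io
import json
import re

# Placeholder for NLTK's English stopword list (assumption: a tiny subset).
STOPWORDS = {"a", "an", "the", "is", "in", "on", "my", "has", "and", "of", "at"}
# Placeholder vocabulary of animals to look for (assumption).
ANIMALS = {"pig", "rooster", "chicken", "snake", "bird"}

# Hypothetical, already-cleaned JSON standing in for the archived file.
raw = json.dumps([
    {"content": "My neighbor has a rooster in the yard",
     "lat": 40.65, "lon": -73.95, "address": "1 Example St",
     "locality": "Brooklyn", "reported_at": "2012-05-01T07:30:00"},
    {"content": "There is a pig on the balcony",
     "lat": 40.85, "lon": -73.88, "address": "2 Example Ave",
     "locality": "Bronx", "reported_at": "2012-06-11T18:05:00"},
])

def tokenize(text):
    """Lowercase, split into alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def extract_animal(text):
    """Return the first token naming a known animal, else the catch-all 'pet'."""
    for token in tokenize(text):
        singular = token[:-1] if token.endswith("s") else token
        if singular in ANIMALS:
            return singular
    return "pet"

# Step 2 and 3: pull out the fields of interest and tag each call with an animal.
rows = []
for call in json.loads(raw):
    rows.append({
        "animal": extract_animal(call["content"]),
        "lat": call["lat"], "lon": call["lon"],
        "address": call["address"], "locality": call["locality"],
        "reported_at": call["reported_at"],
    })

# Step 4: write CSV to an in-memory buffer (use a real file handle in practice).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

The repair of the malformed JSON in step 1 is omitted because it depends on the specific defects in the archived file.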

Below we present some summary statistics on the nature of the complaints across New York City and their distribution by borough and locality. The label “pet” covers other, less frequently reported animals (examples include rats, kangaroos, wolf-dogs and bobcats).

We can see that in all four boroughs (data from Queens was missing) the most commonly reported exotic pets are roosters and chickens. These may also be among the most annoying or frightening animals to neighbors, which could be an alternative explanation for why they are reported so often. Exotic birds, on the other hand, might be illegal, but people may not bother to report them because they do not pose a threat.

Now we proceed to visualize the distribution of reports for the four most common exotic pet types throughout New York City.

The maps reveal some interesting patterns: the distribution of illegal pets is not the same across the city. Snakes tend to be concentrated in the center of Brooklyn, while exotic birds are concentrated in its northern area. Roosters are apparently popular all across this borough and the Bronx, while pigs are prevalent in southern Brooklyn. As the earlier plots revealed, Manhattan and Staten Island do not show any particular concentration, mainly due to the low number of reports in these areas.

Finally, we created an interactive explorer for unusual pet complaints, in which the user can navigate through New York City and see each complaint as an animal marker. The specific call content is displayed when the user clicks on the icon. To visit the interactive app, click on the link below.

2.2 Agency

Which agencies respond to the animal-related 311 calls most frequently?

From the bar chart, we see that the Department of Housing Preservation and Development (HPD) receives the most calls, followed by the Department of Sanitation (DSNY) and the Department of Environmental Protection (DEP).

Next, we explore the spatial distribution of the animal-related agency responses.

We used ggplot for these borough-agency maps, removing the default background and grid lines for a simple but bold aesthetic. These elements added no information to the visualization, and removing them follows one of Edward Tufte’s main data visualization principles: eliminating “chartjunk.”

Some key takeaways:

  • HPD is the most frequently responding agency overall, and these responses are clustered toward the northeastern part of Staten Island, rather than being spread out over the whole borough.

  • The complaints to which the DOT responded trace the pattern of NYC’s main roads.

  • For the DEP plot, we expected agency responses to be in parks. In fact, there is empty space over the parks, such as Central Park and Prospect Park. Perhaps this says something about the difference between where people issue their 311 complaints and where the cause of the complaint occurs, or about the capabilities of the park rangers to handle animal issues.

2.3 Time

From the yearly complaints plot, we can see that the number of complaints increased sharply after 2014. The reason is that Pest was added to the complaints only after 2014, and it soon became the top reason for animal complaints.
Next, we draw a calendar plot of 2015 to see the trend of animal complaints throughout the year. It shows that the number of animal complaints is greater during the summer months than in other seasons.

On the calendar plot, Nov. 13th has the darkest color, which means it had the highest number of complaints in 2015. We then looked closely at what types of complaints were made on that day, and found that Pests is the overwhelmingly dominant descriptor.

We wanted to know whether this large number of Pests complaints had any geographic pattern. It turns out that ZIP code 10452 accounts for a large percentage of the Pests complaints on that day. This may suggest that many people complained about the same situation on the same day, or that the same group of people made multiple complaints.
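This kind of drill-down (busiest day, then descriptor, then ZIP code) can be sketched as below. The records are invented placeholders that merely mimic the pattern described, so the counts and shares carry no meaning beyond illustrating the method; the tuple layout is also an assumption.

```python
from collections import Counter
from datetime import date

# Invented records: (created date, descriptor, ZIP code).
calls = [
    (date(2015, 11, 13), "PESTS", "10452"),
    (date(2015, 11, 13), "PESTS", "10452"),
    (date(2015, 11, 13), "PESTS", "10467"),
    (date(2015, 11, 13), "Pigeon Waste", "11201"),
    (date(2015, 7, 4), "PESTS", "10452"),
    (date(2015, 7, 5), "Pigeon Waste", "10002"),
]

# 1. Count complaints per day and find the busiest day.
daily = Counter(day for day, _, _ in calls)
peak_day, peak_count = daily.most_common(1)[0]

# 2. Break the busiest day down by descriptor.
on_peak = [c for c in calls if c[0] == peak_day]
by_descriptor = Counter(desc for _, desc, _ in on_peak)

# 3. For the Pests complaints on that day, find the dominant ZIP code and its share.
pest_zips = Counter(zc for _, desc, zc in on_peak if desc == "PESTS")
top_zip, top_n = pest_zips.most_common(1)[0]
top_share = top_n / sum(pest_zips.values())
```

Distinguishing many callers from a few repeat callers would require an address or caller field, which this sketch does not assume is available.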

III. A Case Study – Pigeons

Of all the animal species found in the city, perhaps none is more often associated with New York City than Columba livia, the common pigeon. These birds seemingly inhabit every square foot of the city, from parks, to statues, to windowsills. While this ability to make a home in any environment is remarkable from an ornithological and evolutionary standpoint, from a residential standpoint, it is nothing but bad news.

When pigeons congregate in a certain location, they are very difficult to remove, and the longer they remain in one place, the dirtier they tend to make it.

It comes as no surprise, then, that from 2010 through 2016 there were nearly 4,000 complaints dealing with either “pigeon waste” or “pigeon odor” on or around private property.

Here is how these calls have been distributed over time:

What is perhaps most interesting about these pigeon complaints is that they seem to confirm what we already suspect - that these birds do not discriminate where they live. Indeed, as the animated map (click on thumbnail to view CartoDB animation) shows, the spatial distribution of pigeon-related 311 calls is quite random, and seemingly uniform across all areas of New York City.

IV. Correlation and Predictive Analysis

4.1 Relationship between Rent Price and Dog/Pests Complaints

The first thing we want to examine is the correlation between housing rent and dog noise complaints. We first plot housing rent across New York City, where a darker color indicates higher rent. As we can see, rent in the upper east region of Manhattan is very high, while rent in the Bronx is relatively low. We then plot a heat map of dog noise complaints in Manhattan and a heat map of pest complaints across the whole of New York City.

As the heat map of dog noise complaints below shows, the density of complaints is very high in the upper east region. This suggests a positive relationship between the number of dog noise complaints and housing rent.

From the heat map of pest complaints below, we find that the density of complaints is very high in Harlem and the Bronx. This suggests a negative relationship between the number of pest complaints and housing rent.
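The visual comparison of the maps could be checked numerically by correlating rent with complaint counts across areas. The sketch below computes a Pearson correlation coefficient by hand. The per-area rent and complaint figures are invented placeholders, not values from the project, so only the method is illustrated.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented per-area figures: monthly rent in dollars and complaint counts.
rent = [1200, 1500, 2100, 2800, 3500]
dog_noise = [4, 6, 9, 12, 15]
pests = [40, 31, 22, 12, 5]

r_dog = pearson(rent, dog_noise)   # positive when complaints rise with rent
r_pest = pearson(rent, pests)      # negative when complaints fall as rent rises
```

A coefficient near +1 or -1 would support the relationships read off the heat maps, while a value near 0 would suggest the visual impression is not backed by the area-level counts.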

4.2 Predictive Analysis on Case Length – Decision Trees

After exploring different features of the 311 calls, we wanted to build a predictive model of how long a case may take based on animal type, season, borough, agency, etc.

We created a new variable indicating what kind of animal was mentioned in each case. Before digging into the model, we first looked at the features of complaints related to different animals and their relationship with other key variables. The following two circos plots show the relationship between animal types and the season, as well as the relationship between animal types and the county (borough). Please note that we did not include dogs and pests in these plots. One reason is that they make up a very large share of all animal-related cases, so including them would make the other animals’ portions invisible. The other reason is that dog and pest complaints are spread almost equally across all seasons and boroughs. They are also analyzed in detail in the previous part.

From the above two graphs we can observe some interesting facts. For example, in the left graph, we can see that bee-related complaints seldom happen in winter; most of them occur during summer. This aligns with the fact that bees are more active in spring and summer, hence people complain more about illegal bees or beekeepers during these seasons. Another interesting fact from the right graph is that animal-related complaints in Manhattan are almost all about wildlife and pigeons, whereas the complaint types in the other boroughs are more diverse.
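Each ribbon in a circos plot of this kind corresponds to one cell of a cross-tabulation. The sketch below derives a season from the complaint date and counts (animal, season) pairs. It assumes meteorological seasons (December to February as winter, and so on), which may differ from the definition used in the project, and the records are invented.

```python
from collections import Counter
from datetime import date

def season_of(d):
    """Map a date to a meteorological season (an assumed definition)."""
    if d.month in (12, 1, 2):
        return "Winter"
    if d.month in (3, 4, 5):
        return "Spring"
    if d.month in (6, 7, 8):
        return "Summer"
    return "Fall"

# Invented (animal, created date) pairs for illustration.
calls = [
    ("bee", date(2015, 6, 20)),
    ("bee", date(2015, 7, 2)),
    ("bee", date(2015, 4, 15)),
    ("pigeon", date(2015, 1, 9)),
    ("rooster", date(2015, 10, 30)),
]

# Each (animal, season) count becomes the width of one ribbon in the plot.
animal_by_season = Counter((animal, season_of(d)) for animal, d in calls)
```

The animal-by-borough table can be built the same way by swapping the season for a borough field.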

Next we fit several decision trees based on some of the key variables from the 311 complaints. We picked the case length as our dependent variable. Because we wanted a classification model rather than a regression model, we created a ‘CaseLength’ variable with 6 categories: Same Day, In 3 Days, In a Week, In 2 Weeks, In a Month, Over a Month.
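One way to derive such a categorical variable from the created and closed timestamps is sketched below. The category labels come from the text, but the exact day cut-offs (0, 3, 7, 14 and 30 days) are assumptions, since the boundaries used in the project are not stated.

```python
from datetime import datetime

# Upper bound in days for each category; these cut-offs are assumptions.
BINS = [(0, "Same Day"), (3, "In 3 Days"), (7, "In a Week"),
        (14, "In 2 Weeks"), (30, "In a Month")]

def case_length(created, closed):
    """Return the CaseLength category for a case given its open and close times."""
    days = (closed.date() - created.date()).days
    if days < 0:
        # A close date before the open date indicates a data-quality problem.
        raise ValueError("closed date precedes created date")
    for upper, label in BINS:
        if days <= upper:
            return label
    return "Over a Month"

# Example: a case opened on Jan 1 and closed on Jan 8 spans 7 calendar days.
example = case_length(datetime(2015, 1, 1, 9, 0), datetime(2015, 1, 8, 16, 30))
```

Counting calendar days rather than elapsed hours means a case opened late one evening and closed the next morning falls under "In 3 Days", which is a design choice worth revisiting if same-day resolution matters.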

The following graphs show several decision trees built from different combinations of the key independent variables. The label colors range from dark green to dark red, indicating case lengths from short to long. The width of each edge represents the proportion of cases falling into that category.

The first decision tree takes Animal as the only predictor. We can see that if the case is about wildlife, it is predicted to be resolved within a day; if the case is about cats or dogs, it is also predicted to take a short time (3 days); however, when the animal involved is more unusual, the case is predicted to take about a month to close.
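A classification tree predicts the most common category among the training cases in each leaf. When every animal ends up in its own leaf, a single-predictor tree reduces to a majority-class lookup per animal. The sketch below shows that simplified stand-in; it is not the tree-fitting procedure used in the project, and the training pairs are invented.

```python
from collections import Counter, defaultdict

# Invented (animal, CaseLength) training pairs for illustration.
training = [
    ("wildlife", "Same Day"), ("wildlife", "Same Day"), ("wildlife", "In 3 Days"),
    ("dog", "In 3 Days"), ("dog", "In 3 Days"), ("dog", "In a Week"),
    ("bee", "In a Month"), ("bee", "In a Month"), ("bee", "In 2 Weeks"),
]

def fit_majority(pairs):
    """Learn the most frequent label per predictor value, plus an overall fallback."""
    per_value = defaultdict(Counter)
    overall = Counter()
    for value, label in pairs:
        per_value[value][label] += 1
        overall[label] += 1
    by_value = {v: c.most_common(1)[0][0] for v, c in per_value.items()}
    return by_value, overall.most_common(1)[0][0]

def predict(model, value):
    """Predict the learned label, using the overall majority for unseen values."""
    by_value, fallback = model
    return by_value.get(value, fallback)

model = fit_majority(training)
prediction = predict(model, "wildlife")
```

A fitted tree may instead group several animals into one leaf, in which case the prediction is the majority category of the whole group.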

The second decision tree takes animal and borough into consideration. We observe a similar pattern to the one above: cases related to wildlife, cats and dogs take less time, while the others take longer. We can also see that cases in the Bronx and Staten Island are predicted to take longer.

The third decision tree uses animal and season. We can clearly see in this tree that winter cases tend to have longer processing times. This seems reasonable given the holiday season and bad weather conditions.

The fourth decision tree shows the predictions based on animal and agency. It is very clear that DOB, DOHMH and HPD generally take longer to close cases. This may be related to the nature of the cases they deal with.

We also fit some other decision tree models, which are shown in the Appendix.

We also tried including more than two variables. Doing so would likely improve the prediction accuracy, but the resulting tree was too large to visualize clearly here.

From the above decision tree graphs, we conclude that information about the animal type, borough, agency and season gives a general idea of how long a case may take.

V. Conclusion


Appendix

A.1 Additional Decision Trees